\section{Software Design} \label{design}

The TAO design philosophy uses object-oriented techniques of data and
state encapsulation, abstract classes, and limited inheritance to
create a flexible optimization toolkit.  This section provides a short
introduction to our design philosophy by describing the objects needed
to create GPCG and the importance of this design.   

At the algorithmic level, we use objects such as matrices, vectors, 
index sets, and linear solvers.  
Our current implementation leverages the parallel computing
and linear algebra infrastructure offered by 
PETSc~\cite{petsc,PETSc-user-ref},
which employs MPI \cite{using-mpi} for all interprocessor communication.
In this context, a vector ({\texttt Vec}) is an abstraction of an
array of values that represent a discrete field, and a matrix
({\texttt Mat}) represents a discrete linear operator that maps
between vector spaces.  An index set ({\texttt IS}) is a
generalization of a set of integer indices, which can be used for
selecting, gathering, and scattering subsets of vector and matrix
elements.  TAO also interfaces to the preconditioned conjugate gradient 
method and other linear solvers within
PETSc.  Through the \texttt{SLES} component of PETSc, users can define an iterative method,
its preconditioner, and the solution tolerance.
Because each of these objects has several underlying 
representations, TAO has easy access to a variety of parallel vector
and sparse matrix implementations as well as preconditioners and
Krylov subspace methods.
% Functional support for the objects includes the
% creation, duplication, and destruction of the objects.

With sufficiently flexible 
abstract interfaces, TAO can support a variety of implementations of 
data structures and algorithms.  These abstractions allow us
to more easily experiment with a range of algorithmic and data
structure options for realistic problems, such as within this case
study.  Such capabilities are critical for making high-performance
optimization software adaptable to the continual evolution of parallel
and distributed architectures and the research community's discovery
of new algorithms that exploit their features.

The interface to TAO uses only these objects, and a
context variable called \texttt{TAO\_SOLVER}, which encapsulates
information about the solution process, including the algorithm,
convergence tolerances, options, and parameters.  All of the
computations and communications related to a particular solution
process are managed in the solver context variable.  This context
must be created through the  \texttt{TaoCreate} routine, which
specifies the optimization method (denoted by
\texttt{TaoMethod}) and the MPI communicator, which specifies the
processes involved in the optimization computations.

To define the problem, the routine 
\texttt{TaoSetQuadraticFunction} in Figure \ref{tao_interface}
sets the objective
function (\ref{def-quadratic}) in terms of the \texttt{Mat} object \texttt{A},
\texttt{Vec} object \texttt{B}, and scalar \texttt{c}
and provides the \texttt{Vec} objects \texttt{X} and \texttt{G} 
that are used for the solution and gradient.
The function
\texttt{TaoSetVariableBounds} defines upper and lower bounds for the variables
\texttt{X} with the \texttt{Vec} objects \texttt{XL} and \texttt{XU}.
Users working in a parallel environment must provide TAO with data structures
\texttt{A}, \texttt{B}, \texttt{X}, \texttt{G}, \texttt{XL}, and \texttt{XU} 
that are properly distributed over the processors.  
Appropriate distribution allows efficient executions of the 
matrix-vector multiplication, vector inner product, and vector \texttt{saxpy}
operations.
Numerical toolkits such as PETSc facilitate the creation of these objects and
provide the functionality for most of the required numerical operations.

After defining
the optimization problem, the user then calls \texttt{TaoSolve} to
determine the solution.  Finally, the user destroys the TAO solver via
\texttt{TaoDestroy}.  The code fragment in Figure \ref{tao_interface}
shows the main functions needed to solve bound-constrained quadratic
programming problems with TAO.

\begin{figure}[htb]
\medskip
\begin{alltt}
  TaoCreate(MPI_Comm comm,TaoMethod method,TAO_SOLVER *tao); 
  TaoSetQuadraticFunction(TAO_SOLVER tao,Vec X,Vec G,Mat A,Vec B,double c);
  TaoSetVariableBounds(TAO_SOLVER tao,Vec XL,Vec XU);
  TaoSolve(TAO_SOLVER tao);
  TaoDestroy(TAO_SOLVER tao);
\end{alltt}
\caption{TAO interface for bound-constrained quadratic problems\label{tao_interface}}
\end{figure}

This interface serves
several algorithms for bound-constrained quadratic problems in
addition to GPCG, including limited memory variable metric, trust
region Newton, and interior point techniques.  Moreover, this single
interface serves other types of optimization problems as well.
Additional routines may be used to specify the starting point
and various options for the optimization solver,
but the structure in  Figure \ref{tao_interface} is needed in all cases.
Detailed information can be found
in the TAO User Guide \cite{tao-user-ref}.

TAO implements the GPCG algorithm as a sequence of well-defined operations.
The operations required to implement the GPCG algorithm as outlined
in Section \ref{alg} include
the vector and matrix operations listed in the preceding paragraph,
functions to compute
the pointwise minimum and maximum of two vectors,
and a function that creates an
index set that defines the indices where the elements
of two vectors are equal.
The evaluation of the function and gradient of
the quadratic $q$,
for instance, can be implemented through the standard
numerical operations of matrix-vector 
multiplication, vector inner product, and vector \texttt{saxpy}.
TAO passes \texttt{Mat} and \texttt{Vec} objects, whose
representation is independent of our implementation of GPCG, to 
external tools
that perform the numerical computations.
Additional work vectors required by the algorithm 
are created by calling a routine that
clones the variable vector \texttt{X} in Figure \ref{tao_interface}.



%  At each iterate of the GPCG algorithm, a gradient projection method is 
%  used to reduce the objective function and projected gradient is used several steps in the direction of the
%  projected gradient 
%  The gradient projection algorithm continues until either 
%  the set of free variables does not change, there is insufficient decrease
%  in the objective value, or the algorithm exceeds 
%  a maximum number of iterations.  The operations used by this method include
%  the vector and matrix operations listed in the previous paragraph, as 
%  well as a binary functions that computes a vector defined by the
%  pointwise minimum of two vectors,
%  a vector defined by the pointwise maximum of two vectors, 
%  and an index set indicating which elements
%  of two vectors equal one another.


At each iteration of the GPCG algorithm, we also need to apply
the conjugate gradient method to the matrix $ A_k $ corresponding
to the free variables.
This is an important phase of the computation because,
as we shall see in Section \ref{sec:performance}, at least $70\%$ of
the GPCG computing time is due to the conjugate gradient method.
We end this section by discussing the implementation of
the conjugate gradient method for solving the
reduced problem in the free variables.

At least two techniques exist for applying the conjugate gradient
method to the reduced system of equations.
One technique creates a second matrix $A_k$ that contains the
rows and columns of $A$ corresponding to the free variables,
and then applies the conjugate gradient method to the reduced system.
An alternative technique applies the conjugate gradient method
to the rows and columns of the full matrix $A$ specified by the
index set of the free variables.
In our implementation, we chose the first method.  Despite the
additional memory requirements and cost of copying data,
this method is simpler,
facilitates the preconditioning and load-balancing of the reduced matrix,
and was easily implemented with the utilities provided by PETSc.

In a parallel environment,
an efficient parallel implementation of the conjugate gradient method
requires that the reduced matrix $A_k$ 
be evenly distributed over the processors,
but since the set of free variables may not be well distributed over the
processors, the reduced matrix may not well distributed---regardless of
how the matrix $A$ is distributed.
Since an unbalanced load can result in tremendous losses in
performance, a redistribution of the rows of $A_k$ over the processors
may be necessary.

In the entire implementation of GPCG, no assumptions are made about
the representations of data in the vectors and matrices.  This
approach eliminates some of the barriers in using independently
developed software components by accepting data that is independent of
representation and interfacing to numerical routines with the appropriate
data formats.  This design enabled us to test the GPCG solver
using several matrix formats and preconditioners without modifying the
solver.


